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ABSTRACT 

Monte  Carlo  calculations  often  require  generation  of  a  random  sample  of  n- 
dimensional  points  drawn  from  a  specified  multivariate  probability  distribution. 
We  present  an  importance  sampling  technique  that  can  often  greatly  improve 
the  efficiency  of  an  acceptance/rejection  generating  method.  The  importance 
sampling  function  is  defined  as  piecewise  constant  on  a  set  of  subregions,  which 
are  obtained  by  adaptively  partitioning  the  sampling  region  so  that  the  variation 
of  density  values  within  each  subregion  is  relatively  small.  The  partitioning 
strategy  is  based  on  multiparameter  optimization  to  estimate  the  maximum  and 
minimum  of  the  original  density  function  in  each  subregion. 
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1.  Introduction 


In  Monte  Carlo  calculations,  it  is  often  necessary  to  generate  a  random  sample 
of  n-dimensional  points  drawn  from  a  specified  multivariate  probability  density 
function  f(x).  Such  sampling  can  not  generally  be  done  directly,  except  in  a  few 
important  special  cases  (see,  e.g.,  Everett  and  Cashwell,  1972). 

When  direct  sampling  is  not  possible,  a  method  that  can  always  be  applied 
(at  least  in  principle)  is  the  acceptance/rejection  technique.  Let  R  denote  the 
sampling  region  of  interest  (which  will  be  assumed  to  be  hyper-rectangular),  and 
let  fuB  be  an  upper  bound  on  the  value  of  /  in  R.  The  sample  is  generated 
by  a  sequence  of  trials,  as  follows.  A  random  number  p  is  uniformly  sampled 
in  the  interval  [0,  Jub\>  and  a  random  point  r  is  sampled  with  uniform  density 
within  the  sampling  region  R.  If  /(r)  >  p,  the  point  r  is  accepted;  otherwise, 
it  is  rejected.  With  this  technique,  the  accepted  points  constitute  the  desired 
random  sample  drawn  from  the  probability  density  function  /. 

The  efficiency  of  an  acceptance/rejection  procedure  is  defined  as  the  ex¬ 
pected  number  of  points  accepted  per  function  evaluation,  and  is  given  by 

/ 


e  = 


Jub‘ 


(1) 


where  /  is  the  mean  of  /  over  the  sampling  region.  From  (1),  it  can  be  seen  that 
the  efficiency  improves  with  the  quality  of  the  upper  bound.  However,  even  at 
best  a  large  number  of  function  evaluations  may  be  required  for  every  point  that 
is  accepted. 

An  improved  efficiency  can  be  achieved  if  a  lower  bound  Jlb  for  the  function 
/  in  R  is  also  known.  If  p  <  f^B,  the  corresponding  point  r  can  be  accepted 
without  evaluating  /.  In  this  case,  the  efficiency  is  defined  as  the  ratio  of  the 
probability  of  acceptance  to  the  probability  of  a  function  evaluation,  and  is  given 

by 


e  = 


fuB  —  f LB 


(2) 


The  optimal  efficiency  t  is  achieved  when  the  upper  and  lower  bounds  are 
attainable,  i.e.,  when  Jub  =  /"*"  (the  maximum  of  /  in  R)  and  /lb  —  f~ 
(the  minimum  of  /  in  R).  Thus, 

'*  =  /+  -  /-  =  (/+  -  /— )T0l(R)-  (3) 

where  I  is  the  integral  of  /  over  R,  and  vol(R)  is  the  volume  of  R  (by  definition, 
/  =  /vol(f?)). 

2.  An  Importance  Sampling  Technique 

The  technique  of  importance  sampling  can  sometimes  improve  the  efficiency 
of  sample  generation.  The  idea  is  to  find  a  second  probability  density  func¬ 
tion  g(x)  —  an  importance  sampling  function  —  for  which  the  trial  points  in 
R  can  be  generated  efficiently.  Points  with  the  original  probability  density 
function  f(x)  can  then  be  produced  by  using  the  ratio  function  f{x)/g{x)  in 
an  acceptance/rejection  procedure  (with  a  suitable  choice  for  the  interval  from 
which  p  is  drawn).  For  an  appropriate  g{x),  importance  sampling  can  yield  sub¬ 
stantial  improvements  in  efficiency  compared  to  simple  acceptance/rejection. 

In  practice  it  is  usually  difficult  to  find  an  effective  importance  sampling 
function  g(x)  for  the  particular  problem  at  hand.  We  present  an  adaptive 
algorithm  that,  given  f(x),  constructs  such  a  suitable  g{x).  For  this  purpose,  we 
initially  make  two  assumptions:  (i)  that  R  has  been  partitioned  into  a  set  of  M 
disjoint  simply-bounded  subregions  {/?,-}, »  =  1,...,M,  such  that  R  —  U<R<; 
and  (ii)  that  and  f~  (the  maximum  and  minimum  of  /  in  R»)  are  known  for 
each  R{.  We  then  define  an  importance  sampling  function  ?(x)  as  the  piecewise 
constant  function 

m  =  ut  i  *  e  «()•  («) 


In  order  for  J  to  be  useful  as  an  importance  sampling  function,  it  must  be 
possible  to  generate  random  sample  points  with  this  density  function.  This  is 


achieved  by  randomly  choosing  a  region  A,-  with  probability 

/+to1(R,) 

P* - - 1  (5) 

E  ft  T0'(R;) 

;=i 

and  then  generating  a  random  point  r  in  A,-  with  uniform  density.  Applying  the 
accept/reject  procedure,  a  random  number  p  is  generated  in  the  interval  [0, 1] 
with  uniform  density,  and  the  point  r  is  accepted  if  /(r)//+  >  p.  Note  that 
» .  <  nut.  r  is  accepted  without  evaluating  /.  The  accepted  points  are  a 
random  sample  from  f(x). 

In  order  to  calculate  the  efficiency  of  an  importance  sampling  proecdure 
based  on  g{x),  we  first  consider  the  probability  of  acceptance  in  each  region.  The 
probability  of  choosing  the  t-th  region  is,  by  construction,  j>,-  (5).  Given  that  A,- 
has  been  chosen,  the  probability  of  accepting  a  point  r  uniformly  chosen  in  A* 
is  Ji/ff,  where  /,  is  the  mean  of  /  in  A,-.  Hence,  the  overall  probability  of  ac¬ 
ceptance  (denoted  by  Pr(A))  is  the  weighted  sum  of  the  acceptance  probabilities 
for  all  the  subregions,  i.e. 


»h>-  /T0|(RJ-A 

i_1  U  —£  /+„!(«,)  ’< 

r  :_1 


£/+to1(S,) 

1=1 

where  I  is  the  integral  of  /  over  the  entire  region  A. 

The  probability  of  a  function  evaluation,  denoted  by  Pr(FE),  is  the  weighted 
sum  of  probabilities  that  /(r)  exceeds  given  that  r  is  in  A,-,  i.e. 

*  E vt-nwm 

Pr{FE)  =  f>/<  - . 

7‘  E/^oKfi.) 
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The  efficiency  of  the  importance  sampling  procedure  (denoted  by  e1B),  is 
then  Pr(A)/Pr{FE),  or 


I 


£(/?•- /n™i(R.) 


i=i 

If  we  define  the  spread  associated  with  the  t'-th  region  as 


(6) 


Si  =  Vt-t7)™ 'to).  CO 


then  (6)  can  be  written  as 

(8) 

*=  l 

The  efficiency  t,B  of  the  importance  sampling  based  on  ?(z)  will  exceed  the 
efficiency  e*  (3)  of  the  original  acceptance/rejection  procedure  if  any  Si  is  less 
than  the  spread  of  the  entire  sampling  region  R.  Furthermore,  the  efficiency 
increases  as  the  sum  of  subregion  spreads  becomes  smaller. 

This  analysis  suggests  that  an  importance  sampling  procedure  based  on  § 
could  be  extremely  effective  if  the  sum  of  spread  measures  over  {R,}  is  small. 
In  order  for  this  observation  to  useful,  we  must  show  how  to  contract  a  set  of 
subregions  {R*}  such  that  £  S<  is  small.  In  the  next  section,  an  adaptive  par¬ 
titioning  technique  will  be  described  that  produces  a  suitable  set  of  subregions. 


3.  The  Adaptive  Refinement  Procedure 

The  adaptive  partitioning  procedure  used  to  construct  the  subregions  {JR,-}  was 
originally  developed  in  the  context  of  multidimensional  quadrature  (Friedman 
and  Wright,  1981a),  and  is  intended  to  produce  a  set  of  subregions  with  “similar” 
spread  measures  (i.e.,  to  minimise  the  sum  of  spreads).  Since  this  is  precisely 
the  quality  sought  in  the  case  of  importance  sampling,  the  same  procedure  can 


s 


be  applied  to  construct  £(z).  This  section  gives  only  a  brief  summary  of  the 
adaptive  refinement  procedure.  A  detailed  description  is  given  in  Friedman  and 
Wright  (i981a);  the  associated  software  is  documented  in  Friedman  and  Wright 
(1981b). 

Consider  a  typical  hyper-rectangular  region  R,  defined  by  simple  bounds  on 
each  coordinate: 

R  =  {x\x\  <  *,<  x?},  (9) 

where  x  is  the  vector  (*i,X2»-*-i*n)r-  Let  S(R)  denote  the  spread  measure 
associated  with  R: 

S(«)  =  (/+-/“)vol(R),  (10) 

where  /+  and  f~~  denote  the  maximum  and  minimum  of  /  in  R.  The  implemen¬ 
tation  of  the  adaptive  strategy  for  partitioning  R  includes  three  steps: 

(1)  calculation  of  the  spread  measure  S(R ); 

(2)  subdivision  of  R; 

(3)  processing  the  new  subregions  and  terminating  the  partitioning  when 
appropriate. 

In  order  to  obtain  S(R),  the  volume  of  R  and  the  extrema  of  /  in  R  must 
be  computed.  Because  of  the  simply-bounded  form  (9)  of  K,  the  volume  is  easy 
to  calculate: 

vol(f?)=  ]J(af —  *•'). 

<=i 

This  simple  expression  for  the  volume  is  the  main  reason  for  partitioning  R  into 
simply- bounded  subregions;  even  for  a  region  defined  by  general  hyperplanes,  cal¬ 
culation  of  the  volume  is  extremely  complicated.  Furthermore,  efficient  uniform 
sampling  within  a  region  of  the  form  (9)  is  straightforward. 

The  other  term  in  the  expression  (10)  might  seem,  at  first  glance,  to  cause 
calculation  of  the  spread  to  be  computationally  intractable,  since  two  optimisa¬ 
tion  problems  must  be  solved.  However,  methods  for  optimisation  problems  with 


simple  bounds  on  the  variables  are  well  developed,  and  thus  the  sub-problems 
associated  with  (10)  can  be  solved  quite  efficiently  if  /  is  a  reasonable  function. 
(A  non-derivative  quasi-Newton  method  for  bound-constrained  optimization  is 
used  to  compute  the  extrema  of  /  in  R;  for  details,  see  Friedman  and  Wright, 
1981a.)  More  importantly,  the  substantial  efficiencies  in  sampling  that  result 
from  a  well  constructed  partition  ultimately  justify  the  function  evaluations  ex¬ 
pended  to  compute  the  extrema. 

The  second  portion  of  the  partitioning  algorithm  involves  dividing  R  into 
disjoint  simply- bounded  subregions  such  that  the  sum  of  spread  measures  is 
reduced.  Let  z+  denote  the  value  of  x  corresponding  to  /+,  and  x~  denote  the 
value  of  x  corresponding  to  f~.  An  “ideal”  partitioning  strategy  might  “split”  R 
into  two  disjoint  subregions:  R+,  which  contains  i+,  and  R~ ,  which  contains 
x~ .  These  subregions  would  be  separated  by  the  boundary  of  points  x  such  that 
/( x)  —  /,  where  /  is  chosen  so  that  the  spread  measures  in  i?+  and  R~~  are 
equal,  i.e. 

(/+  -  /)T0l(R+)  =  (/  -  /-) yoI(R-).  (11) 

The  strategy  just  described  is  impractical  —  not  only  because  of  the  difficulties 
in  defining  the  boundary  between  /?+  and  R~ ,  but  also  because  the  subregions 
could  be  of  arbitrary  shape  (which  makes  calculation  of  their  volumes  intract¬ 
able).  Therefore,  our  partitioning  algorithm  uses  a  similar,  but  simplified  motiva¬ 
tion,  to  construct  a  set  of  simply  bounded  subregions.  In  particular,  the  bound¬ 
aries  of  the  subregions  are  defined  by  “cuts”  along  each  coordinate  direction 
from  one  of  the  extrema.  The  desired  cuts  are  the  solution  of  a  system  of  non¬ 
linear  equations  similar  to  (11),  and  can  be  computed  efficiently  using  a  special 
secant-like  method  for  solving  the  associated  nonlinear  system  (see  Friedman  and 
Wright,  1981a). 

Finally,  after  R  has  been  partitioned,  the  daughter  subregions  are  merged 
into  the  list  of  all  regions.  If  the  global  stopping  criteria  are  satisfied  (e.g.,  the 


overall  sum  of  spread  measures  is  sufficiently  small)  the  partitioning  terminates. 
Otherwise,  the  list  of  regions  is  scanned  for  the  one  with  the  largest  spread 
measure,  which  is  then  considered  for  refinement  at  the  next  iteration. 

4.  Efficiency  of  the  Adaptive  Importance  Sampling  Procedure 

The  procedure  described  in  Section  3  can  be  applied  to  define  ff(x)  (4)  by  con¬ 
structing  a  set  of  subregions  along  with  the  corresponding  values  {/+  };  the 

resulting  <y(z)  can  then  be  used  in  importance  sampling  as  described  in  Section 
2.  This  technique  will  be  effective  to  the  extent  that  the  function  evaluations 
needed  to  construct  the  partition  lead  to  a  sufficiently  improved  efficiency  in  the 
importance  sampling. 

Let  Np  denote  the  number  of  function  evaluations  expended  to  construct 
the  partition  of  R.  Although  tis  (8)  will  tend  to  increase  monotonically  with 
Np,  the  relationship  obviously  depends  on  /.  It  has  been  found  empirically  that 
tIB  grows  very  rapidly  with  increasing  Np  for  small  values  (Np  <  15, 000),  and 
then  slows  down  to  a  nearly  linear  dependence  for  larger  values. 

Let  Ng  denote  the  number  of  points  to  be  sampled.  The  efficiency  of  the 
suggested  adaptive  importance  sampling  procedure  is  given  by 

E  —  = _ — _  fl2l 

Np  +  N./e, „  1  +(Np/N.)el8'  K  1 

As  we  would  expect,  E  is  always  less  than  e1B,  but  will  approach  e„  as  Ne 
increases. 

From  (12),  we  see  that  the  number  of  function  evaluations  that  should  be 
expended  to  perform  the  partitioning  in  order  to  obtain  maximum  efficiency 
depends  on  Nt.  If  only  a  few  points  are  to  be  sampled,  it  is  not  worthwhile  to 
use  many  function  evaluations  in  the  partitioning  phase.  On  the  other  hand, 
a  substantial  investment  in  the  partitioning  will  be  worthwhile  if  a  great  many 
points  are  to  be  sampled. 
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In  order  to  illustrate  the  effectiveness  of  this  procedure,  we  present  in  Table 
1  the  results  of  applying  it  to  several  problems.  For  each  example,  we  give  the 
simple  acceptance/rejection  efficiency  e*  (3),  the  value  of  Np  and  the  resulting 
number  of  subregions,  and  the  efficiency  era  (8).  The  number  Nv  was  taken  as  a 
“reasonable”  value;  as  noted  above,  the  best  choice  of  Np  depends  on  the  number 
of  points  to  be  sampled. 

For  each  example,  the  number  of  sample  points  was  computed  for  which  the 
work  saved  by  using  the  partitioning  method  is  equal  to  the  work  necessary  to 
construct  the  subregions.  This  “crossover”  value  Ne  is  given  by 

N  - 

’  1/e*- 1/e,,' 

Thus,  for  the  first  example  of  Table  1,  allowing  250,000  function  evaluations  for 
the  partitioning  would  yield  a  sampling  efficiency  ttB  «  1.0,  with  a  corresponding 
crossover  value  of  Nc  —  501. 

As  the  results  indicate,  dramatic  improvements  in  sampling  efficiency  can 
be  achieved  by  applying  the  proposed  adaptive  procedure. 


References 

Everett,  C.  J.  and  Cashwell,  E.  D.  (1972).  A  Monte  Carlo  sampler,  Report 
LA-5061-MS,  Los  Alamos  Scientific  Laboratory,  New  Mexico. 

Friedman,  J.  H.  and  Wright,  M.  H.  (1981a).  A  Nested  Partitioning  Procedure 
for  Numerical  Multiple  Integration,  ACM  Tt&ns.  Math.  Software,  vol.  7, 
pp.  76-92. 

Friedman,  J.  H.  and  Wright,  M.  H.  (1981b).  DIVONNE4:  A  Program  for  Multiple 
Integration  and  Adaptive  Importance  Sampling,  CGTM  193,  Stanford  Linear 
Accelerator  Center. 


UNCLASSIFIED 


SECURITY  CL  AM<  FI  CATION  or  THU  PAPE  ?WN—  Pt^EiH«— 


REPORT  DOCUMENTATION  PAGE 

READ  INSTRUCTIONS 

BEFORE  COMPLETING  FORM 

17  REPORT  number  1.  OOVT  ACCESSION  MO. 

SOL  81-23  ft'D-yj  i  10  tfC 

1.  RECIPIENT'S  CATALOG  NUMBER 

*.  TITLEfaMMWIi) 

AN  ADAPTIVE  IMPORTANCE  SAMPLING  PROCEDURE 

*.  TYPE  OF  REPORT  B  PERIOD  COVERED 

Technical  Report 

PERFORMING  OBO.  REPORT  NUMBER 

7.  AUTHOR? •) 

Jerome  H.  Friedman  and  Margaret  H.  Wright 

B.  CONTRACT  oh  GRANT  NUMBER?*) 

N00014-75-C-0267 

DAAG29-8 1 -K-0 1 56 

PERFORMING  ORGANIZATION  NAME  AMO  ADDRESS 

Department  of  Operations  Research  -  SOL 

Stanford  University 

Stanford,  CA  94305 

IB.  PROGRAM  ELEMENT.  PROJECT.  TASK 
AREA  k  WORK  UNIT  NUMBERS 

NR-047- 143 

1 1.  CONTROLLING  OFFICE  NAME  AND  ADDRESS 

Office  of  Naval  Research  -  Dept,  of  the  Navy 

800  N.  Quincy  Street 

Arlington,  VA  22217 

It.  REPORT  DATE 

November  1981 

IS.  NUMBER  OF  PAGES 

9 

U.  MONITORING  AGENCY  name  •  ADDRESS???  Mllmrml  from  Controlling  Ollleo) 

U.S.  Army  Research  Office 

P.0.  Box  12211 

Research  Triangle  Park,  NC  27709 

IS.  SECURITY  CLASS.  ?of  this  roport) 

UNCLASSIFIED 

IS  a.  DECLASSIFIC  ATI  ON?  DOWN  GRADING 
SCHEDULE 

1«.  DISTRIBUTION  STATEMENT  (of  thfo  Rmport) 

This  document  has  been  approved  for  public  release  and  sale; 

Its  distribution  Is  unlimited. 

IT.  DISTRIBUTION  STATEMENT  ?o?  Hit  abstract  ontorod  In  Bleak  SO,  If  dlllooonl  tnm  RapartJ 

IS.  SUPPLEMENTARY  NOTES 

THE  VIEW,  OPINIONS.  ANO'Cn  FP.-HNOn  CONTA.rn  IN  TH-nnEPom 

.1  l2* 

CIS  ION,  UNLESS  SO  DLolOliAiLJ  LY  Oirt-N  uo  . "" 

It.  KEY  WOROS  (ConHnuo  on  foroooo  oldo  It  noeooooof  ond  Idonllfy  Of  Mack  mwmOor) 

sampling  probability  distributions, 

Mpnte  Carlo  procedures  adaptive  partitioning 

/ 

/ 

/ 

tt.  ABSTRACT  (Conllnuo  on  rororoo  olgo  II  noooooory  ong  Idomhtf  Of  Hook  —Mr) 

Monte  Carlo  calculations  often  require  generation  of  a  random  'sample  of 
n-dimensional  points  drawn  from  a  specified  multivariate  probability 
distribution.  We  present  an  importance  sampling  technique  that  can  often 

greatly  improve  the  efficiency  of  an  acceptance/rejection  generating  method. 
The  importance  sampling  function  is  defined  as  piecewise  constant  on  a  set  of 
subregions,  which  are  obtained  by  adaptively  partitioning  the  sampling  region  so 
that  the  variation  of  density  values  within  each  subregion  is  relatively  small. 
The  partitioning  strategy  is  based  on  multiparameter  optimization  to  estimate 
..the  maximum  and  minimum  of  the  original  densltv  function  in  each  subreaion. 

00  I  MM*n  1473  «WT10H  OF  I  NOV  ••  IS  oosocirc  /~\ 


SECURITY  CLASSIFICATION  OS  THIS  P  AOE  <w»m>  In 


